# Use Kyuubi JDBC to Access LakeSoul's Table
Available since version 2.4.
LakeSoul implements a Flink connector and a Spark connector. Through Kyuubi, we can query LakeSoul tables with Spark SQL or Flink SQL.
## Requirements
| Component | Version |
|-----------|---------|
| Kyuubi    | 1.8     |
| Spark     | 3.3     |
| Flink     | 1.17    |
| LakeSoul  | 2.6.0   |
| Java      | 1.8     |
The examples below assume a Linux environment with Spark, Flink, and Kyuubi already installed. It is recommended to deploy the Kyuubi engines on Hadoop YARN, though you can also start a local Spark/Flink cluster.
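As a minimal sketch of the YARN deployment (assumptions: the YARN/HDFS client configs are reachable via `HADOOP_CONF_DIR`, and the Flink engine attaches to a YARN session; check the Kyuubi deployment docs for the authoritative keys):

```bash
# make the YARN/HDFS client configuration visible to the engines (path is an example)
export HADOOP_CONF_DIR=/etc/hadoop/conf
```

```properties
# $KYUUBI_HOME/conf/kyuubi-defaults.conf
# run the Spark SQL engine on YARN
spark.master=yarn
# attach the Flink SQL engine to a YARN session (started via yarn-session.sh)
flink.execution.target=yarn-session
```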
## Flink SQL to Access LakeSoul's Table
### 1. Dependencies
Download the LakeSoul Flink jar from https://github.com/lakesoul-io/LakeSoul/releases/download/v2.6.0/lakesoul-flink-1.17-2.6.0.jar and put it under `$FLINK_HOME/lib`.
### 2. Configurations
Set the PostgreSQL (PG) parameters required by LakeSoul's metadata store according to this LakeSoul document: Setup Metadata Database Connection for Flink.
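As a hedged sketch, LakeSoul typically picks up its metadata connection from environment variables set before the Flink cluster starts; the driver class, endpoint, and credentials below are placeholders, so take the authoritative names and values from the document above:

```bash
# placeholder endpoint and credentials -- substitute your own
export LAKESOUL_PG_DRIVER=com.lakesoul.shaded.org.postgresql.Driver
export LAKESOUL_PG_URL="jdbc:postgresql://localhost:5432/lakesoul_test?stringtype=unspecified"
export LAKESOUL_PG_USERNAME=lakesoul_test
export LAKESOUL_PG_PASSWORD=lakesoul_test
```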
After this, you can start a Flink session cluster or a Flink application as usual, as shown below.
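For example, either of the following starts a session the Kyuubi Flink engine can attach to (standard Flink CLI, nothing LakeSoul-specific):

```bash
# local standalone session cluster
$FLINK_HOME/bin/start-cluster.sh

# or a detached YARN session
$FLINK_HOME/bin/yarn-session.sh -d
```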
### 3. LakeSoul Operations
Use Kyuubi beeline to connect to the Flink SQL engine:

```bash
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=FLINK_SQL'
```
Access LakeSoul with Flink SQL:

```sql
create catalog lakesoul with ('type'='lakesoul');
use catalog lakesoul;
use `default`;

create table if not exists test_lakesoul_table_v1 (
  `id` INT,
  name STRING,
  score INT,
  `date` STRING,
  region STRING,
  PRIMARY KEY (`id`, `name`) NOT ENFORCED
) PARTITIONED BY (`region`, `date`) WITH (
  'connector'='lakeSoul',
  'use_cdc'='true',
  'format'='lakesoul',
  'path'='hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v1/',
  'hashBucketNum'='4'
);

insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (1, 'AAA', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (2, 'BBB', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (3, 'AAA', 98, '2023-05-10', 'China');

select * from `lakesoul`.`default`.test_lakesoul_table_v1 limit 1;

drop table `lakesoul`.`default`.test_lakesoul_table_v1;
```
You can replace the `hdfs://` scheme in the table path with `file://` to store data on the local filesystem, as shown below.
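For instance (the table name and the `/tmp` path below are illustrative choices, not from the LakeSoul docs):

```sql
create table if not exists test_lakesoul_table_local (
  `id` INT,
  name STRING,
  score INT,
  `date` STRING,
  region STRING,
  PRIMARY KEY (`id`, `name`) NOT ENFORCED
) PARTITIONED BY (`region`, `date`) WITH (
  'connector'='lakeSoul',
  'use_cdc'='true',
  'format'='lakesoul',
  -- local filesystem instead of HDFS
  'path'='file:///tmp/lakesoul/default/test_lakesoul_table_local/',
  'hashBucketNum'='4'
);
```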
For more details on using Flink SQL with LakeSoul, refer to: Flink LakeSoul Connector.
## Spark SQL to Access LakeSoul's Table
### 1. Dependencies
Download the LakeSoul Spark jar from https://github.com/lakesoul-io/LakeSoul/releases/download/v2.6.0/lakesoul-spark-3.3-2.6.0.jar and put it under `$SPARK_HOME/jars`.
### 2. Configurations
- Set the PostgreSQL (PG) parameters required by LakeSoul's metadata store according to this LakeSoul document: Setup Metadata Database Connection for Spark (a hedged sketch follows this list).
- Add the following Spark conf to `$SPARK_CONF_DIR/spark-defaults.conf`:

  ```properties
  spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension
  spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog
  spark.sql.defaultCatalog=lakesoul
  spark.sql.caseSensitive=false
  spark.sql.legacy.parquet.nanosAsLong=false
  ```
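As a hedged sketch of the PG setup for Spark (the `lakesoul_home` convention, property names, and values below are assumptions; confirm them against the LakeSoul document referenced above), LakeSoul typically reads a properties file pointed to by an environment variable:

```bash
# assumed convention: lakesoul_home points at a properties file with the PG connection
export lakesoul_home=/path/to/lakesoul.properties
```

```properties
# /path/to/lakesoul.properties -- placeholder endpoint and credentials
lakesoul.pg.driver=com.lakesoul.shaded.org.postgresql.Driver
lakesoul.pg.url=jdbc:postgresql://localhost:5432/lakesoul_test?stringtype=unspecified
lakesoul.pg.username=lakesoul_test
lakesoul.pg.password=lakesoul_test
```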
### 3. LakeSoul Operations
Use Kyuubi beeline to connect to the Spark SQL engine:

```bash
$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=SPARK_SQL'
```
Access LakeSoul with Spark SQL:

```sql
use default;

create table if not exists test_lakesoul_table_v2 (
  id INT,
  name STRING,
  score INT,
  date STRING,
  region STRING
) USING lakesoul
PARTITIONED BY (region, date)
LOCATION 'hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v2/';

insert into test_lakesoul_table_v2 values (1, 'AAA', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (2, 'BBB', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (3, 'AAA', 98, '2023-05-10', 'China');

select * from test_lakesoul_table_v2 limit 1;

drop table test_lakesoul_table_v2;
```
As with Flink, you can replace the `hdfs://` scheme in the table location with `file://`, as shown below.
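For example (the `/tmp` path is illustrative):

```sql
create table if not exists test_lakesoul_table_v2 (
  id INT,
  name STRING,
  score INT,
  date STRING,
  region STRING
) USING lakesoul
PARTITIONED BY (region, date)
-- local filesystem instead of HDFS
LOCATION 'file:///tmp/lakesoul/default/test_lakesoul_table_v2/';
```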
For more details on using Spark SQL with LakeSoul, refer to: Operate LakeSoulTable by Spark SQL.